Evolution of the social network of scientific collaborations 
The co-authorship network of scientists represents a prototype of complex evolving networks. In 
addition, it offers one of the most extensive databases to date on social networks. By mapping the 
electronic database containing all relevant journals in mathematics and neuro-science for an eight- 
year period (1991-98), we infer the dynamic and the structural mechanisms that govern the evolution 
and topology of this complex system. Three complementary approaches allow us to obtain a detailed 
characterization. First, empirical measurements allow us to uncover the topological measures that 
characterize the network at a given moment, as well as the time evolution of these quantities. 
The results indicate that the network is scale-free, and that the network evolution is governed by 
preferential attachment, affecting both internal and external links. However, in contrast with most 
model predictions the average degree increases in time, and the node separation decreases. Second, 
we propose a simple model that captures the network's time evolution. In some limits the model 
can be solved analytically, predicting a two-regime scaling in agreement with the measurements. 
Third, numerical simulations are used to uncover the behavior of quantities that could not be 
predicted analytically. The combined numerical and analytical results underline the important role 
internal links play in determining the observed scaling behavior and network topology. The results 
and methodologies developed in the context of the co-authorship network could be useful for a 
systematic study of other complex evolving networks as well, such as the world wide web, Internet, 
or other social networks. 
I. INTRODUCTION 

One of the most prolific mathematicians of all time,
Paul Erdos wrote over 1400 papers with over 500
co-authors. This unparalleled productivity inspired the
concept of the Erdos number, which is defined to be one
for his many co-authors, two for the co-authors of his
co-authors, and so on. The tightly interconnected nature
of the scientific community is reflected by the conjecture
that all publishing mathematicians, as well as many
physicists and economists, have rather small Erdos
numbers. Besides its immediate interest for scientometrics,
the co-authorship network is of general interest
for understanding the topological and dynamical laws
governing complex networks, as it represents the
largest publicly available computerized social network.

Social networks have been much studied in the social
sciences. A general feature of these studies is that
they are restricted to rather small systems, and often 
view networks as static graphs, whose nodes are individ- 
uals and links represent various quantifiable social inter- 
actions. 

In contrast, recent approaches with methodology 
rooted in statistical physics focus on large networks, 
searching for universalities both in the topology of the 
web and in the dynamics governing its evolution. These
combined theoretical and empirical results have opened
unsuspected directions for research and a wealth of applications
in many fields ranging from computer science to
biology. Three important results seem
to crystallize in this respect. First, most networks have
the so-called small-world property, which means
that the average separation between the nodes is rather
small, i.e. one can find a short path along the links between
most pairs of nodes. Second, real networks display
a degree of clustering higher than expected for random
networks. Finally, it has been found that the degree
distribution contains important information about
the nature of the network, with many large networks following
a scale-free power-law distribution, inspiring the
study of scale-free networks.

In addition to uncovering generic properties of real net- 
works, these studies signal the emergence of a new set of 
modeling tools that considerably enhance our ability to 
characterize and model complex interactive systems. To 
illustrate the power of these advances we choose to
investigate in detail the collaboration network of scien- 
tists. 

Recently Newman has taken an important step towards
applying modern network ideas to collaboration
networks [10,11]. He studied several large databases focusing
on several fields of research over a five-year period,
establishing that collaboration networks have all the
general ingredients of small-world networks: they have a
surprisingly short node-to-node distance and a large clustering
coefficient, much larger than the one expected
from a random Erdos-Renyi type network of similar size
and average connectivity. Furthermore, the degree distribution
appears to follow a power law.

Our study takes a different, but complementary ap- 
proach to collaboration networks than that followed by 
Newman. We view collaboration networks as prototypes
of evolving networks, where the accent is on dynamics 
and evolution. Indeed, the co-authorship network con- 
stantly expands by the addition of new authors to the 
database, as well as the addition of new internal links 
representing papers co-authored by authors that were al- 
ready part of the database. The topological properties of 
these networks are determined by these dynamical and 
growth processes. Consequently, in order to understand 
their topology, we first need to understand the dynamical
process that determines their evolution. In this respect
Newman's study focuses on the static properties of
the collaboration graph, while our work investigates the
dynamical properties of these networks. We show that
such a dynamical approach can explain many of the static
topological features seen in the collaboration graph.

It is important to emphasize that the properties of the 
co-authorship network are not unique. The WWW is 
also a complex evolving network, where nodes and links 
are added (and removed) at a very high rate, the network 
topology being profoundly determined by these dynamical
features. The actor network of Hollywood
is very similar to the co-authorship network, because it
grows through the addition of new nodes (actors) and
new links (movies linking existing actors). Similarly,
the nontrivial scaling properties of many cellular,
ecological or business networks are all determined
by dynamical processes that contributed to the
emergence of these networks. So why single out the col- 
laboration network as a case study? A number of fac- 
tors have contributed to this choice. First we needed a 
network for which the dynamical evolution is explicitly 
available. That is, in addition to a map of the network 
topology, it is important to know the time at which the 
nodes and links have been added to the network, crucial 
for revealing the network dynamics. This requirement re- 
duces the currently available databases to two systems: 
the actor network, where we can follow the dynamics 
by recording the year of the movie release, and the col- 
laboration network for which the paper publication year 
allows us to track the time evolution. Of these two, the 
co-authorship data is closer to a prototypical evolving 
network than the Hollywood actor database for the fol- 
lowing reasons: in the science collaboration network the 
co-authorship decision is made entirely by the authors, 
i.e. decision making is delegated to the level of individual 
nodes. In contrast, for actors the decision often lies with 
the casting director, a level higher than the node. While 
in the long run this difference is not particularly impor- 
tant, the collaboration network is still closer in spirit to 
a prototypical evolving network such as social systems or
the WWW.

Our work stands on three pillars. First, we use direct 
measurements on the available data to uncover the mech- 
anism of network evolution. This implies determining the 
different parameters and uncovering the various compet- 
ing processes present in the system. Second, building 
on the mechanisms and parameters revealed by the measurements,
we construct a model that allows us to investigate
the large-scale topology of the system, as well as its
dynamical features. The predictions offered by a continuum
theory of the model allow us to explain some of the
results that were uncovered by our measurements, as well as by
Newman's. The third and final step involves computer
simulations of the model, serving several purposes:
(i) they allow us to investigate quantities that could not
be extracted from the continuum theory; (ii) they verify the
predictions of the continuum theory; (iii) they allow us to
understand the nature of the measurements we can perform
on the network, explaining some apparent discrepancies
between the theoretical and the experimental results.



II. DATABASES: CO-AUTHORSHIP IN 
MATHEMATICS AND NEURO-SCIENCE 

For each research field whose practitioners collaborate 
in publications one can define a co-authorship network 
which is a reflection of the professional links between the 
scientists. In this network the nodes are the scientists 
and two scientists are linked if they wrote a paper to- 
gether. In order to get information on the topology of a 
scientific co-authorship web one needs a complete dataset 
of the published papers, ideally from the birth of the dis- 
cipline until today. However, computer databases cover 
at most the past several decades. Thus any study of this 
kind needs to be limited to only a recent segment of the 
database. This imposes unexpected challenges that
need to be addressed, since such limited data availability
is a general feature of most networks. 

The databases considered by us contain article titles 
and authors of all relevant journals in the field of mathe- 
matics (M) and neuro-science (NS), published in the pe- 
riod 1991-98. We have chosen these two fields for several 
reasons. A first factor was the size of the database: bio- 
logical sciences or physics are orders of magnitude larger, 
too large to address their properties with reasonable com- 
puting resources. Second, the selected two fields offer 
sufficient diversity by displaying different publishing patterns:
in NS collaboration is intense, while mathematics,
although there is an increasing tendency towards collaboration,
is still basically a single-investigator field.

In mathematics our database contains 70,975 different 
authors and 70,901 papers for an interval spanning eight 
years. In NS the number of different authors is 209,293 
and the number of published papers is 210,750. Complete
statistics for the two databases considered are summarized
in Fig. 1, where we plot the cumulative number
of papers and authors for the period 1991-98. We consider
a "new author" to be an author who was not present in the
database from 1991 up to a given year.
FIG. 1. (a) Cumulative number of papers for the M and
NS databases in the period 1991-98. The inset shows the
number of papers published each year. (b) Cumulative number
of authors (nodes) for the M and NS databases in the
period 1991-98. The inset shows the number of new authors
added each year.

Before proceeding we need to clarify a few method- 
ological issues that affect the data analysis. First, in 
the database the authors are represented by their sur- 
name and initials of first and middle name, thus there is 
a source of error in distinguishing some of them. Two 
different authors with the same initials and surname will 
appear to be the same node in the database. This error is 
important mainly for scientists of Chinese and Japanese 
descent. Second, occasionally a given author uses one or two
initials in different publications, and in such cases he/she
will appear as separate nodes. Newman showed that
the error introduced by these problems is of the order
of a few percent. Our results are also affected by these
methodological limitations, but we do not expect them to
have a significant impact on our results.

III. DATA ANALYSIS 

In this section we investigate the topology and dynamics
of the two databases, M and NS. Our goal is to extract
the parameters that are crucial to the understanding of
the processes which determine the network topology, offering
input for the construction of an appropriate model.

A. Degree distribution follows a power-law 

A quantity that has been much studied lately for vari- 
ous networks is the degree distribution, P(k), giving the 
probability that a randomly selected node has k links. 
Networks for which P(k) has a power-law tail are known
as scale-free networks. On the other hand, classical
network models, including the Erdos-Renyi and
the Watts and Strogatz models, have an exponentially
decaying P(k) and are collectively known as exponential
networks. The degree distributions of both the M and NS
data indicate that collaboration networks are scale-free.
The power-law tail is evident from the raw, uniformly
binned data (Fig. 2a,b), but the scaling regime is better
seen on the plot that uses logarithmic binning, reducing
the noise in the tail (Fig. 2c). The cumulative data with
logarithmic binning indicates $\gamma_M = 2.4$ and $\gamma_{NS} = 2.1$
for the two databases.
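As an illustration of why logarithmic binning reduces the noise in the tail, the following minimal sketch bins integer degrees into intervals that double in width and fits a straight line on the log-log plot. The function names, the factor-of-two bin ratio and the least-squares fit are assumptions made for this example, not the procedure used for Fig. 2.

```python
import math

def log_binned_pk(degrees):
    """Estimate P(k) with bins [1,2), [2,4), [4,8), ... (bin ratio 2 is an assumption)."""
    positive = [k for k in degrees if k >= 1]
    counts = {}
    for k in positive:
        b = k.bit_length() - 1          # bin b covers degrees 2**b .. 2**(b+1) - 1
        counts[b] = counts.get(b, 0) + 1
    total = len(positive)
    points = []
    for b in sorted(counts):
        lo, hi = 2 ** b, 2 ** (b + 1)
        center = math.sqrt(lo * hi)     # geometric bin center
        density = counts[b] / (total * (hi - lo))  # normalize by the bin width
        points.append((center, density))
    return points

def fit_exponent(points):
    """Least-squares slope of log P versus log k; returns gamma = -slope."""
    xs = [math.log(c) for c, _ in points]
    ys = [math.log(d) for _, d in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope
```

Dividing each count by the bin width is what turns the binned counts into a density, so that a power law appears as a straight line of slope $-\gamma$.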
FIG. 2. Degree distribution for the (a) M and (b) NS
database, showing the data based on the cumulative results up
to years 1993 (x) and 1998 (•). (c) Degree distribution shown
with logarithmic binning computed from the full dataset cumulative
up to 1998. The lines correspond to the best fits,
and have the slope 2.1 (NS, dotted) and 2.4 (M, dashed).

We will see in the coming sections that the data indi- 
cates the existence of two scaling regimes with two dif- 
ferent scaling exponents. The combination of these two 
regimes could easily give the impression of an exponential 
cutoff in the P(k) for large k. Further analysis, offered
in Sections V-VII, indicates that a consideration of two
scaling regimes offers a more accurate description.

B. Average separation decreases in time 



The ability of two nodes, i and j, to communicate with
each other depends on the length of the shortest path, $l_{ij}$,
between them. The average of $l_{ij}$ over all pairs of nodes
is denoted by $d = \langle l_{ij} \rangle$, and we call it the average separation
of the network, characterizing the network's interconnectedness.
Large networks can have surprisingly
small separation, explaining the origin of the small-world
concept. Determining the average separation in
a large network is a rather time-consuming procedure.
Usually sampling a fraction of all nodes and determining
their distance from all other points gives reasonable results.
The results for the cumulative database are shown
in Fig. 3.

FIG. 3. Average separation in the M and NS databases.
The separation is computed on the cumulative data up to the
indicated year. The error bars indicate the standard deviation
of the distances between all pairs of nodes.

The figure indicates that d decreases with time, which
is highly surprising because all network models so far
predict that the average separation should increase with
system size. The decreasing trend observed by us
could have two different origins. First, it is possible that
the separation does indeed decrease, as internal links, i.e.
papers written by authors that were previously part of
the database, increase the interconnectivity, thus decreasing
the diameter. Second, the decreasing diameter could
be a consequence of the fact that we have no access to the
full database, but only to the data starting from 1991. As we
demonstrate in Sect. VI, such an incomplete dataset could
result in an apparently decreasing separation even if
the separation increases for the full system.

One can note the slow convergence of the diameter and
the more connected nature of the NS field, expressed by a
smaller separation. The slow convergence indicates that
perhaps an even longer time interval is needed to reach the
asymptotic limit, in which different relevant quantities
take up a stationary value. The smaller separation for
the NS field is expected, since mathematicians tend to
work in smaller groups and write papers with fewer co-authors.

C. Clustering coefficient decays with time

An important phenomenon characterizing the deviation
of real networks from the completely random ER model
is clustering. The clustering coefficient C, a quantitative
measure of this phenomenon, can be defined as follows:
pick a node i that has links to $k_i$ other nodes
in the system. If these $k_i$ nodes formed a fully connected
clique, there would be $k_i(k_i - 1)/2$ links between them, but
in reality we find much fewer. Let us denote by $N_i$ the
number of links that connect the selected $k_i$ nodes to
each other. The clustering coefficient for node i is then
$C_i = 2N_i / [k_i(k_i - 1)]$. The clustering coefficient for the
whole network is obtained by averaging $C_i$ over all nodes
in the system, i.e. $C = \langle C_i \rangle$. In simple terms the
clustering coefficient of a node in the co-authorship network
tells us how much a node's collaborators are willing
to collaborate with each other, and it represents the
probability that two of its collaborators wrote a paper
together. The clustering coefficient for the cumulative
network as a function of time is shown in Fig. 4.

FIG. 4. Clustering coefficient of the M and NS databases,
determined for the cumulative data up to the year indicated
on the t axis.

The results, in agreement with the separation measurements,
suggest a stronger interconnectedness for the NS
compared with M, and a slow convergence in time to an
asymptotic value.
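The two measurements discussed in Sections III B and III C can be sketched as follows. This is a minimal illustration, not the code used for Figs. 3 and 4: the adjacency format (a dict mapping each author to a set of co-authors), the function names, the choice to average only over reachable pairs, and the convention $C_i = 0$ for nodes with fewer than two links are all assumptions of the example.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def sampled_average_separation(adj, n_sources, seed=0):
    """Average distance from a random sample of source nodes to all reachable nodes."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    sources = rng.sample(nodes, min(n_sources, len(nodes)))
    total, pairs = 0, 0
    for s in sources:
        for target, d in bfs_distances(adj, s).items():
            if target != s:             # assumption: unreachable pairs are ignored
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")

def average_clustering(adj):
    """C = <C_i>, with C_i = 2 N_i / [k_i (k_i - 1)]."""
    values = []
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            values.append(0.0)          # assumption: C_i = 0 when k_i < 2
            continue
        nb = list(neighbors)
        links = sum(1 for a in range(k) for b in range(a + 1, k) if nb[b] in adj[nb[a]])
        values.append(2.0 * links / (k * (k - 1)))
    return sum(values) / len(values)
```

Sampling sources keeps the cost at one breadth-first search per sampled node instead of one per node in the network.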

D. Relative size of the largest cluster increases 

It is important to realize that the collaboration net- 
work is fragmented in many clusters. There are several 
reasons for this. First, in every field there are scientists 
that do not collaborate at all, that is, they are the only authors
of all papers on which their name appears. This is
more frequent in mathematics, which, despite an increasing
tendency toward collaboration, is still more fragmented
than physics or neural science. Second, and most
important, the database contains papers published only
after 1990. Thus there is a possibility that two authors
co-authored a paper before 1991, but in our database
they appear as disconnected.

If we look only at a single year, we see many isolated 
clusters of authors. The cumulative dataset containing 
several years develops a giant cluster, that contains a 
large fraction of the authors. To investigate the emer- 
gence of this giant connected component we measured 
the relative size of the largest cluster, r, giving the ra- 
tio between the number of nodes in the largest cluster 
and the total number of nodes in the system. A cluster 
is defined as a subset of nodes interconnected by links. 
Results from our cumulative co-authorship networks are 
presented in Fig. 5. As expected, in M the fraction of
clustered researchers is considerably smaller than in NS. 
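A minimal sketch of how the relative size r of the largest cluster could be computed from a list of papers, using a union-find structure. The input format (each paper as a list of author names) and the function name are assumptions of this example.

```python
from collections import Counter

def relative_largest_cluster(papers):
    """Return r = (nodes in largest cluster) / (all nodes) for a list of author lists."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for authors in papers:
        if not authors:
            continue
        root = find(authors[0])
        for author in authors[1:]:
            other = find(author)
            if other != root:
                parent[other] = root        # merge the co-author's cluster into this one

    if not parent:
        return 0.0
    sizes = Counter(find(author) for author in list(parent))
    return max(sizes.values()) / len(parent)
```

Applying the function to the papers accumulated up to each year gives the time dependence of r.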



FIG. 5. Relative size of the largest cluster for the M and
NS databases. Results are computed on the cumulative data
up to the given year.

The continuous increase in r may appear as the scenario
commonly described as percolation or the much
studied emergence of the giant component in random networks.
However, the process leading to this giant
cluster is fundamentally different from these much studied
phenomena. In most research fields, apart from a very
small fraction of authors that do not collaborate, all authors
belong to a single giant cluster from the very early
stages of the field. That is, the system is almost fully connected
from the very first moment. The only reason why
the giant cluster in our case grows so dramatically in the
first several years is that we are missing the information
on the network topology before 1991. A good example is
the actor network, where the huge majority of the actors
are part of the large cluster at any stage of the network,
starting from the early 1900s until today. However, if we
were to start recording collaborations only after 1990, for
example, the data would indicate, incorrectly, that many
actors are disconnected. The increasing r indicates only
the fact that we are reconstructing the already existing
giant cluster, and it is only a partial measure of its emergence.

Finally, the fast convergence of the NS cluster size to
an approximately stationary value around 0.9 indicates
that after 1994 the network reached a roughly stationary
topology, i.e. the basic alliances are uncovered. This does
not seem to be the case for M, where after ten years r
still increases, perhaps due to the smaller publication and
collaboration rate in the field.

E. Average degree increases

With time the number of nodes in our co-authorship
network increases due to the arrival of new authors. The
total number of links also increases through the connections
made by new authors with old ones and by new
connections between old authors. A quantity characterizing
the network's interconnectedness is the average degree
< k >, giving the average number of links per author.
The time dependence of < k > for the cumulative
network is shown in Fig. 6, indicating an approximately
linear increase of < k > with time.

FIG. 6. Average number of links per node (< k >) for the
M and NS databases. Results are computed on the cumulative
data up to the given year.

This is a rather important deviation from the majority
of currently existing evolving network models, which
assume a constant < k > as the network expands. As
expected, the average degree for M is much smaller than
for NS.
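The cumulative average degree can be tracked year by year along the following lines. This is a sketch under stated assumptions: the input is a hypothetical dict mapping each year to a list of author lists, and repeated collaborations between the same two authors are counted as a single link.

```python
def average_degree_by_year(papers_by_year):
    """Return {year: <k>} for the network accumulated up to and including that year."""
    neighbors = {}
    result = {}
    for year in sorted(papers_by_year):
        for authors in papers_by_year[year]:
            for a in authors:
                neighbors.setdefault(a, set())   # sole authors are nodes with k = 0
            for a in authors:
                for b in authors:
                    if a != b:
                        neighbors[a].add(b)      # a set keeps each co-author once
        total_degree = sum(len(s) for s in neighbors.values())
        result[year] = total_degree / len(neighbors) if neighbors else 0.0
    return result
```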



F. Node selection is governed by preferential 
attachment 

Classical network models assume complete randomness,
i.e. the nodes are connected to each other independent
of the number of links they already had.
The discovery of the power-law connectivity
distribution required the development of new modeling
paradigms. A much used assumption is that in scale-free
networks nodes link with higher probability to those
nodes that already have a larger number of links, a phenomenon
labeled as preferential attachment. Implicitly
or explicitly, preferential attachment is part of
all network models that aim to explain the emergence of
the inhomogeneous network structure and power-law connectivity
distribution. The availability of dynamic
data on the network development allows us to investigate
its presence in the co-authorship network. For this network
preferential attachment appears at two levels, which
we discuss separately.

(i) New nodes: For a new author, who appears for the
first time on a publication, preferential attachment has
a simple meaning: it is more likely that the first paper
will be co-authored with somebody who already has a
large number of co-authors (links) than with somebody
less connected. As a result, "old" authors with more links
will increase their number of co-authors at a higher rate
than those with fewer links. To investigate this process
in quantitative terms we determined the probability that
an old author with connectivity k is selected by a new
author for co-authorship. This probability defines the
$\Pi(k)$ distribution function. Calling "old authors" those
present up to the last year, and "new authors" those who
were added during the last year, we determine the change
in the number of links, $\Delta k$, for an old author who at the
beginning of the last year had k links. Plotting $\Delta k$ as a
function of k gives the function $\Pi(k)$, describing the nature
of preferential attachment. Since the measurements
are limited to only a finite ($\Delta T = 1$ year) interval, we
improve the statistics by plotting the integral of $\Pi(k)$:

$$\kappa(k) = \int^{k} \Pi(k') \, dk'. \qquad (1)$$

If preferential attachment is absent, $\Pi(k)$ should be independent
of k, as each node grows independently of its
degree, and $\kappa(k)$ is expected to be linear. As Fig. 7 shows,
we find that $\kappa(k)$ is nonlinear, increasing as $\kappa(k) \sim k^{\nu+1}$,
where the best fit gives $\nu \simeq 0.8$ for M and $\nu \simeq 0.75$ for
NS. This implies that $\Pi(k) \sim k^{\nu}$, where $\nu$ is different
from 1. As simulations have shown, such nonlinear
dependence generates deviations from a power law P(k).
This was supported by analytical calculations
that demonstrated that the degree distribution follows
a power law only for $\nu = 1$. The consequence of this
nonlinearity will be discussed below.
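The measurement for new nodes can be sketched as below. The data layout is hypothetical: `old_degree` maps each old author to the number of links held at the beginning of the last year, and `new_links` lists the (new author, old author) pairs created during that year. The integral is approximated by a running sum over the degree values that actually occur, which is an assumption of this example rather than the procedure used for Fig. 7.

```python
from collections import defaultdict

def external_attachment(old_degree, new_links):
    """Return (ks, delta_k, kappa): mean link gain per degree class and its running sum."""
    gained = defaultdict(int)
    for _new_author, old_author in new_links:
        if old_author in old_degree:
            gained[old_author] += 1

    nodes_with_k = defaultdict(int)
    gain_with_k = defaultdict(int)
    for author, k in old_degree.items():
        nodes_with_k[k] += 1
        gain_with_k[k] += gained[author]

    ks = sorted(nodes_with_k)
    # Average gain of an old author that started the year with k links
    delta_k = [gain_with_k[k] / nodes_with_k[k] for k in ks]

    kappa, running = [], 0.0
    for value in delta_k:
        running += value                # discrete stand-in for the integral of Pi(k)
        kappa.append(running)
    return ks, delta_k, kappa
```

Because the running sum averages out fluctuations between neighboring degree classes, the cumulative curve is much smoother than the raw $\Delta k$ values.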
FIG. 7. Cumulated preferential attachment ($\kappa(k)$) of incoming
new nodes for the M and NS databases. Results
computed by considering the new nodes coming in the specified
year, and the network formed by nodes already present up to
this year. In the absence of preferential attachment $\kappa(k) \sim k$,
shown as continuous line on the figures.



(ii) Internal links: A large number of new links appear
between old nodes as the network evolves, representing
papers written by authors who were part of the
network, but did not collaborate before. Such internal
links are known to affect both the topology and dynamics
of the network. These internal links are also subject
to preferential attachment. We studied the probability
$\Pi(k_1, k_2)$ that an old author with $k_1$ links forms a new
link with another old author with $k_2$ links. The $\Pi(k_1, k_2)$
probability map can be calculated by dividing $N(k_1, k_2)$,
the number of new links between authors with $k_1$ and $k_2$
links, by $D(k_1, k_2)$, the number of pairs of nodes with
connectivities $k_1$ and $k_2$ present in the system:

$$\Pi(k_1, k_2) = \frac{N(k_1, k_2)}{D(k_1, k_2)}. \qquad (2)$$

The three-dimensional plot of $\Pi(k_1, k_2)$ is shown in Fig. 8,
the overall behavior indicating preferential attachment:
$\Pi(k_1, k_2)$ increases as either $k_1$ or $k_2$ increases.




FIG. 8. Internal preferential attachment for the M and NS
databases, 3D plots: $\Delta k$ as a function of $k_1$ and $k_2$. Results
computed on the cumulative data in the last considered year.



A natural hypothesis is to assume that $\Pi(k_1, k_2)$ factorizes
into the product $k_1 k_2$. As Fig. 9 shows, we indeed
find that

$$\kappa(k_1 k_2) = \int^{k_1 k_2} \Pi(k_1' k_2') \, d(k_1' k_2') \qquad (3)$$

can be well approximated with a slope 2 as a function
of $k_1 k_2$, indicating that for internal links the preferential
attachment is linear in the degree.
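The link between a slope of 2 and linear attachment can be made explicit with a short worked step. The constant $c$ and the lower cutoff $x_0$ are introduced here only for illustration, with $x = k_1 k_2$ denoting the degree product.

```latex
% Assumption: \Pi depends only on x = k_1 k_2 and is linear in it, \Pi(x) = c\,x.
\kappa(x) = \int_{x_0}^{x} c\,x' \, dx'
          = \frac{c}{2}\left(x^{2} - x_0^{2}\right)
          \approx \frac{c}{2}\,x^{2} \quad (x \gg x_0),
\qquad\text{so}\qquad
\frac{d \ln \kappa}{d \ln x} \to 2 .
% Without preferential attachment, \Pi(x) = c gives \kappa(x) \approx c\,x, i.e. slope 1.
```

A slope of 2 on the log-log plot of the cumulative function therefore corresponds to an attachment probability proportional to the first power of $k_1 k_2$.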
FIG. 9. Cumulated internal preferential attachment ($\kappa(k_1 k_2)$)
for the M and NS databases, scaling as a function of the $k_1 k_2$
product. Results computed on the cumulative data in the last
considered year. The straight lines have slope 1, expected for
$\kappa(k_1 k_2)$ if there were no preferential attachment.



IV. MODELING THE WEB OF SCIENCE 

In this section we use the obtained numerical results 
to construct a simple model for the evolution of the co- 
authorship network. It is important to emphasize that 
the purpose of the model is to capture the main mech- 
anisms that affect the evolution and the scaling of the 
network, and not to incorporate every numerical detail 
of the measured web. However, the advantage of the pro- 
posed model is its flexibility: features, neglected here, can 
be incorporated into the current modeling framework. 

We denote by $k_i(t)$ the number of links node i has at
time t, and by T(t) and N(t) the total number of links and
total number of nodes at time t, respectively. We assume
that all nodes present in the system are active, i.e. they
can author further papers. This is a reasonable assumption
as the time span over which data is available to us
is shorter than the professional lifetime of a scientist. In
agreement with Fig. 1, we consider that new researchers
join the field at a constant rate, leading to

$$N(t) = \beta t. \qquad (4)$$

The average number of links per node in the system at
time t is thus given by:

$$\langle k \rangle = \frac{T(t)}{N(t)}. \qquad (5)$$



Fig. 9 suggests that the probability to create a new internal
link between two existing nodes is proportional to
the product of their connectivities. Consequently, denoting
by a the number of newly created internal links per
node in unit time, we write the probability that between
nodes i and j a new internal link is created as

$$\Pi_{ij} = \frac{k_i k_j}{\sum'_{s,m} k_s k_m} \, N(t) \, a, \qquad (6)$$

where the prime sign indicates that the summation is
done for $s \neq m$ values.

The measurements also indicated (Fig. 7) that new
nodes link to the existing nodes with preferential attachment,
$\Pi(k)$ following $k^{\nu}$ with $\nu$ between 0.75 and 0.8. Aiming to
obtain an analytically solvable model, at this point we
neglect this nonlinearity and approximate $\Pi(k)$ with
a linear k dependence. The effect of the nonlinearities
will be discussed in Sect. VII. Thus, if node i has $k_i$
links, the probability that an incoming node will connect
to it is given by

$$\Pi_i = b \, \frac{k_i}{\sum_j k_j}, \qquad (7)$$

where b is the average number of new links that an incoming
node creates.

We have thus formulated the dynamical rules that 
govern our evolving network model, capturing the basic 
mechanism governing the evolution of the co-authorship 
network: 

1. Nodes join the network at a constant rate. 

2. Incoming nodes link to the already present nodes
following preferential attachment (7).

3. Nodes already present in the network form new internal
links following preferential attachment (6).

4. We neglect the aging of nodes, and assume that all 
nodes and links present in the system are active, 
able to initiate and receive new links. 

In the model we assume that the number of authors
on a paper, m, is constant. In reality m is a stochastic variable,
as the number of authors varies from paper to paper.
However, for the scale-free model the exponent $\gamma$ is
known to be independent of m, thus making m a stochastic
variable is not expected to change the scaling behavior.
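The four rules can be turned into a small simulation along the following lines. This is a sketch under stated assumptions, not the simulation code of the paper: one node joins per time step, the seed network is two linked authors, repeated links between the same pair are allowed, and a fractional number of internal links per step is handled by random rounding. The function name and default parameters are hypothetical.

```python
import random

def simulate(steps, a=0.01, b=1, seed=0):
    """Grow a network with external and internal preferential attachment."""
    rng = random.Random(seed)
    degree = [1, 1]      # assumption: seed network of two authors joined by one link
    stubs = [0, 1]       # node i appears degree[i] times, so a uniform draw is ~ k_i
    mean_degree = []
    for _ in range(steps):
        # Rule 2: the incoming node links to b old nodes with probability ~ k_i
        targets = [rng.choice(stubs) for _ in range(b)]
        new = len(degree)
        degree.append(0)
        for target in targets:
            degree[target] += 1
            degree[new] += 1
            stubs += [target, new]
        # Rule 3: about a * N internal links per step, chosen with probability ~ k_i * k_j
        expected = a * len(degree)
        n_internal = int(expected) + (rng.random() < expected - int(expected))
        for _ in range(n_internal):
            i, j = rng.choice(stubs), rng.choice(stubs)
            while i == j:                       # reject self-links
                i, j = rng.choice(stubs), rng.choice(stubs)
            degree[i] += 1
            degree[j] += 1
            stubs += [i, j]
        mean_degree.append(sum(degree) / len(degree))
    return degree, mean_degree
```

Drawing uniformly from the list of link ends is a standard way to implement linear preferential attachment without recomputing normalization sums at every step.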



V. CONTINUUM THEORY 

Taking into account that new nodes join the system
with a constant rate, $\beta$, the continuum equation for the
evolution of the number of links node i has can be written
as:

$$\frac{dk_i}{dt} = b\beta \, \frac{k_i}{\sum_j k_j} + N(t) \, a \sum_j{}' \frac{k_i k_j}{\sum'_{s,m} k_s k_m}. \qquad (8)$$



The first term on the right-hand side describes the contribution
due to new nodes (7) and the second term gives
the new links created with already existing nodes (6).
The total number of links at time t can be computed
taking into account the internal and external preferential
attachment rules:

$$\sum_i k_i = T(t) = \int_0^t 2 \, [N(t') a + b\beta] \, dt' = t\beta(at + 2b). \qquad (9)$$

Consequently the average number of links per node increases
linearly in time,

$$\langle k \rangle = at + 2b, \qquad (10)$$

in agreement with our measurements on the collaboration
network (Fig. 6). The master equation (8) can be
solved if we approximate the double sum in the second
term. Taking into account that we are interested in the
asymptotic limit where the total number of nodes is large
relative to the connectivity of the nodes, we can write:

$$\sum_{s,m}{}' k_s k_m = \sum_s k_s \sum_m k_m - \sum_m k_m^2 \approx \Big( \sum_i k_i \Big)^2. \qquad (11)$$

We have used here the fact that $T(t)^2$ depends on $N^2$,
while $\sum_m k_m^2$ depends only linearly on N (we investigate
the $N \to \infty$ limit). Using (11), equation (8) now becomes:



$$ \frac{dk_i}{dt} = \frac{b k_i}{t(at + 2b)} + \frac{a k_i}{at + 2b}. \tag{12} $$

Introducing the notation α = a/b, we obtain:

$$ \frac{dk_i}{dt} = \frac{k_i}{t}\,\frac{t\alpha + 1}{t\alpha + 2}. \tag{13} $$



This differential equation is separable, the general
solution having the form

$$ k_i(t) = C_i \sqrt{t}\,\sqrt{2 + \alpha t}. \tag{14} $$

The integration constant C_i can be determined from the
initial conditions for node i. Since node i joins the system
at time t_i, we have k_i(t_i) = b, leading to

$$ k_i(t) = b\,\sqrt{\frac{t}{t_i}}\,\sqrt{\frac{2 + \alpha t}{2 + \alpha t_i}}. \tag{15} $$

This implies that for large times (t → ∞) the connectivity
of the node scales linearly with time, i.e. k(t) ~ t.
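
The closed-form solution can be checked numerically. The sketch below integrates dk/dt = (k/t)(αt + 1)/(αt + 2) from the initial condition k(t_i) = b with a classical fourth-order Runge-Kutta scheme and compares the result with k(t) = b √(t/t_i) √((2 + αt)/(2 + αt_i)). The function names, the step count and the parameter values used in the check are illustrative assumptions, not taken from the paper.

```python
import math


def rhs(t, k, alpha):
    """Right-hand side dk/dt = (k/t) * (alpha*t + 1) / (alpha*t + 2)."""
    return (k / t) * (alpha * t + 1.0) / (alpha * t + 2.0)


def integrate(t_i, t_end, b, alpha, steps=20000):
    """Integrate from k(t_i) = b to t_end with classical 4th-order Runge-Kutta."""
    h = (t_end - t_i) / steps
    t, k = t_i, b
    for _ in range(steps):
        k1 = rhs(t, k, alpha)
        k2 = rhs(t + h / 2.0, k + h * k1 / 2.0, alpha)
        k3 = rhs(t + h / 2.0, k + h * k2 / 2.0, alpha)
        k4 = rhs(t + h, k + h * k3, alpha)
        k += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return k


def closed_form(t, t_i, b, alpha):
    """k(t) = b * sqrt(t/t_i) * sqrt((2 + alpha*t) / (2 + alpha*t_i))."""
    return b * math.sqrt(t / t_i) * math.sqrt((2.0 + alpha * t) / (2.0 + alpha * t_i))
```

For example, `integrate(10.0, 1000.0, 2.0, 0.0005)` and `closed_form(1000.0, 10.0, 2.0, 0.0005)` should agree to within the integration error.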

A quantity of major interest is the degree distribution,
P(k). The nodes join the system randomly at a constant
rate, which implies that the t_i values are uniformly
distributed in time between 0 and t. The distribution
function for the t_i in the [0, t] interval is simply ρ(t) = 1/t.
P(k) can be obtained after determining the
t_i(k_i) dependence from (15), giving

$$ P(k) = -\rho(t)\,\frac{dt_i}{dk_i} \tag{16} $$

$$ \phantom{P(k)} = \frac{b^2\,(2/\alpha + t)}{k^2 \sqrt{k^2 + b^2 t\,(2 + \alpha t)}}. \tag{17} $$



An immediate consequence of this result is that the
connectivity distribution depends both on the observation
time t and on the range of k values we explore. In the
asymptotic limit t → ∞ we obtain

$$ P(k) \propto \frac{1}{k^2}, \tag{18} $$

predicting a scale-free behavior with exponent γ = 2.
At short times, however, the exponent is different, the 
network exhibiting a scale-free behavior similar to the 
scale- free model 0,Ol: 



$$ P(k) \propto \frac{1}{k^3}. \tag{19} $$



Thus the model predicts that the degree distribution of
the collaboration network displays a crossover between
two scaling regimes. In general, scaling is controlled by
the time dependent crossover connectivity, given by

$$ k_c = \sqrt{b^2 t\,(2 + \alpha t)}. \tag{20} $$

For k ≪ k_c the degree distribution scales with an
exponent γ = 2, while for k ≫ k_c the degree distribution
scales with γ = 3. The crossover connectivity, k_c,
increases linearly in time for t ≫ 2/α, which implies that
in the asymptotic limit (t → ∞) only the γ = 2 exponent
is observable.
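
The crossover can be made concrete with a short numerical sketch. It assumes only that the k dependence of the degree distribution has the shape 1/(k² √(k² + k_c²)), with k_c = √(b² t (2 + αt)), and estimates the local exponent −d ln P / d ln k by a centred finite difference. The function names and the parameter values are illustrative assumptions.

```python
import math


def crossover(t, alpha, b):
    """Crossover connectivity k_c = sqrt(b^2 * t * (2 + alpha*t))."""
    return math.sqrt(b * b * t * (2.0 + alpha * t))


def shape(k, k_c):
    """k dependence of P(k), up to a k-independent prefactor (assumed form)."""
    return 1.0 / (k * k * math.sqrt(k * k + k_c * k_c))


def local_exponent(k, k_c, eps=1e-4):
    """Local exponent gamma(k) = -d ln P / d ln k via a centred difference."""
    hi = math.log(shape(k * (1.0 + eps), k_c))
    lo = math.log(shape(k * (1.0 - eps), k_c))
    return -(hi - lo) / (math.log(1.0 + eps) - math.log(1.0 - eps))
```

For this shape the local exponent equals 2 + k²/(k² + k_c²), so it moves from 2 for k ≪ k_c to 3 for k ≫ k_c and passes through 2.5 at k = k_c.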

Note that this result predicts that the degree distribution
has two scaling regimes, one with γ = 2 for small
k, followed by a crossover to γ = 3 for large k. This
crossover towards a larger exponent can be easily ap- 
proximated with an exponential cutoff, which is why we 
believe that in ||ici| the power law with an exponential 
cutoff gave a reasonable fit. However, as jll| and our 
results show, for datasets with better statistics the scal- 
ing regimes can be distinguished. Indeed, the crossover 
is visible in Fig. 2 as well, in particular for the degree
distribution of NS. The degree distribution taken in 1993
has a clear γ = 3 tail, as for the studied short time-frame
(3 years) k_c is expected to be low. This γ = 3 tail all
but disappears, however, in 1998, being replaced with a
γ = 2 exponent, as predicted by (18) for the limit t → ∞.
The M database shows similar characteristics, although the
crossover is masked by a higher spread in the data points
owing to the weaker statistics.

Plotting two differently cumulated values instead of P(k)
makes the γ = 2 and γ = 3 scaling regimes more
evident. Let us denote by F(k) the primitive function of
P(k), defining:

$$ \Phi(k) = -F(1) - \int_1^k P(k')\,dk'. \tag{21} $$

Φ(k) can be determined numerically by integrating P(k)
between 1 and k and subtracting the constant at which
the integral converges. For small k the function Φ(k)
should scale as

$$ \Phi(k) \propto k^{-1}, \tag{22} $$

assuming that P(k) scales as given by (18). As Fig. 10
shows, we indeed find that for large t (1998) the measured
Φ(k) function converges to a k^-1 behavior, which is less
apparent on the small t curves (1993 and 1995).

To investigate the large k behavior of P(k) we
measured the Γ(k) function defined as:

$$ \Gamma(k) = \int_k^{\infty} P(k')\,dk', \tag{23} $$

which captures the scaling of the tail. According to (19),
for large k and small t one should observe

$$ \Gamma(k) \propto k^{-2}. \tag{24} $$

As Fig. 10 shows, we indeed find that for NS for small
t (1993) the large k scaling follows the prediction (24),
and, as predicted, the scaling increasingly deviates from
it as time increases.
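
The two cumulated quantities can be evaluated from a tabulated P(k) as sketched below. The sketch assumes a trapezoid rule on an increasing grid, takes the constant in Φ(k) to be the integral over the whole tabulated range, and truncates the upper limit of Γ(k) at the last grid point; the function names are illustrative. Under these assumptions the two arrays coincide numerically and differ only in the range of k that is inspected.

```python
import math


def phi_gamma(ks, ps):
    """Return (phi, gamma) on the grid ks for tabulated values ps = P(ks)."""
    # Trapezoid-rule area of each grid segment.
    seg = [(ks[i + 1] - ks[i]) * (ps[i] + ps[i + 1]) / 2.0
           for i in range(len(ks) - 1)]
    total = sum(seg)  # constant at which the forward integral converges
    # phi: the constant minus the integral from ks[0] up to k.
    phi = [total]
    running = 0.0
    for s in seg:
        running += s
        phi.append(total - running)
    # gamma: integral from k up to the last grid point (truncated tail).
    gamma = [0.0] * len(ks)
    for i in range(len(seg) - 1, -1, -1):
        gamma[i] = gamma[i + 1] + seg[i]
    return phi, gamma


def slope(ks, ys, i, j):
    """Log-log slope of ys between grid points i and j."""
    return math.log(ys[j] / ys[i]) / math.log(ks[j] / ks[i])
```

On a geometric grid and a test distribution of the shape 1/(k² √(k² + k_c²)), the slope of `phi` well below k_c is close to −1 and the slope of `gamma` well above k_c is close to −2.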





FIG. 10. Scaling of Φ(k) (a) and of Γ(k) (b) for the NS
database, demonstrating the trends in the small and large k
behavior of the degree distribution (see text).



VI. MONTE CARLO SIMULATIONS

While the continuum theory discussed in the previous
section predicts the connectivity distribution in agreement
with the empirical data, there are other quantities,
such as the node separation and clustering coefficient,
that cannot be calculated using this method at this
point. To investigate the behavior of these measures of
the network topology, we next study the model proposed
in Sect. IV using Monte Carlo simulations.

Due to memory and computing time limitations we
investigated relatively small networks, with total number
of nodes N < 4000. While these networks are considerably
smaller than the real networks, their scaling
and topological features should be representative. In order
to form a reasonable number of internal links, we
increased the parameter a in Eq. (7). For comparison
purposes we note that in the real system we have
a_M = 0.31/year ≈ 10^-4/simulation step and a_NS =
0.98/year ≈ 3.684 · 10^-5/simulation step, numbers that
can be derived from the data shown in Fig. 6 and Fig. 1b.

The advantage of the modeling efforts, including the
Monte Carlo simulations, is that they reproduce the network
dynamics from the very first node. In contrast, the
database we studied records nodes and links only after
1991, when much of the network's structure was already in
place. By collecting data over several years we gradually
discover the underlying structure. We expect that after
a sufficiently long measurement time the structure revealed
by the collected data will be statistically indistinguishable
from the full collaboration network. However, the
dynamics we measure during this process for the relevant
quantities (diameter, average connectivity, clustering
coefficient) might differ from those characterizing the full
network, since all of them are computed on the incomplete
network (revealed by the available data). Monte Carlo
simulations allow us to investigate the effect of the data
incompleteness on the relevant network measures.

We investigated the time dependence of the average
connectivity, the diameter and the clustering coefficient,
using the parameters N_max = 1000, a = 0.001, β = 1
and b = 2. In order to improve the statistics, the results
were averaged over 10 independent configurations.

Average degree: As Fig. 11 indicates, asymptotically
the average connectivity increases linearly, in agreement
with both our measurements (see Fig. 6) and the continuum
theory (see Eq. (10)).

FIG. 11. Computer simulated dynamics of the average
connectivity in the proposed model. (N_max = 1000, a = 0.001,
β = 1 and b = 2)



Average separation: The empirical results indicated
(see Fig. 3) that the average separation decreases with
time for both databases. In contrast, our simulations
show a monotonically increasing d, in apparent disagreement
with the real system.

FIG. 12. Computer simulated dynamics for the real and
apparently measured diameter value. (N_max = 1000, a = 0.001,
β = 1, b = 2 and N_s = 200)

Note that an increasing diameter agrees with measurements
done on other models, including scale-free and
exponential networks, that all predict an approximately
logarithmic increase with the number of nodes, d ∝ ln(N)
[^8|,^2|. This contradiction between the models and our 
empirical data is rooted in the incomplete data we have 
for the first years of our measurements. To show this we 
perform the following simulation. We construct a network
of N = 1000 nodes. However, we record the
apparent diameter of the network made only of nodes that
have been added after a predefined time, mimicking
the fact that the data available to us gives d only
for publications after 1991. We find that the separation
of this incomplete network has a decreasing tendency,
slowly converging to the real value (Fig. 12), in agreement
with the decrease observed in the empirical measurements
(Fig. 3). This result underlines the importance
of simulations in understanding the dynamics of complex
networks, and resolves the conflict between the simulation
and the empirical data. It also indicates that most
likely the diameter of the M and NS databases does
increase in time, but such an increase can be observed only if
much longer time intervals become available for study.
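
The incomplete-data experiment can be sketched as follows. This is a simplified illustration under stated assumptions rather than the simulation used for Fig. 12: internal links are omitted for brevity, node labels equal arrival order, and the separation is averaged over connected pairs only (the sub-network of late nodes may be fragmented). The function names `grow` and `mean_separation` are hypothetical.

```python
import random
from collections import deque


def grow(n_max, b=2, seed=0):
    """Preferential-attachment growth; node labels equal arrival order."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(b + 1)}
    stubs = []  # one entry per link end: degree-weighted sampling pool
    for i in range(b + 1):
        for j in range(i + 1, b + 1):
            adj[i].add(j)
            adj[j].add(i)
            stubs += [i, j]
    for new in range(b + 1, n_max):
        targets = set()
        while len(targets) < b:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for target in sorted(targets):
            adj[new].add(target)
            adj[target].add(new)
            stubs += [new, target]
    return adj


def mean_separation(adj, keep):
    """Average shortest-path length over connected pairs of nodes in 'keep'."""
    total = 0
    pairs = 0
    for src in keep:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search restricted to 'keep'
            u = queue.popleft()
            for v in adj[u]:
                if v in keep and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")
```

A usage example: with `adj = grow(400)`, the full separation is `mean_separation(adj, set(adj))`, while the apparent separation seen by an observer who only records nodes arriving after step 100 is `mean_separation(adj, {v for v in adj if v >= 100})`. Repeating the second call at successive network sizes gives the apparent time evolution.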

FIG. 13. Clustering coefficient for different values of the a
parameter as a function of the system size N. (N_max = 1000,
β = 1 and b = 2; values of a are 0 (•), 0.00025 (+),
0.0005 (△), 0.00075 (*), 0.002 (▽).) The inset shows the
scaling of the N_min value as a function of the a parameter.
(N_max = 1000, β = 1 and b = 2; the line shows a fit
ln N_min = −1.887 − 1.144 · ln a.)

Clustering coefficient: The clustering coefficient
predicted by our simulations is shown in Fig. 13. As the
figure indicates, C depends strongly on the value of the
parameter a. For a = 0 we have essentially the scale-free
model H and the clustering coefficient has a monotonically
decreasing tendency. For a > 0, however, the
clustering coefficient decreases at the beginning and, after
reaching a minimum at N_min, changes its course,
asymptotically increasing with time. Thus, for all a > 0, we
expect that in the asymptotic limit the clustering coefficient
should increase, in agreement with our measurements
on the collaboration network (see Fig. 4). The
N_min position where the clustering coefficient has a minimum
scales as a power of the a parameter, as shown in
the inset of Fig. 13.
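
As a small worked example, the fitted relation quoted in the caption of Fig. 13, ln N_min = −1.887 − 1.144 · ln a, can be evaluated directly. The function name `n_min` is an illustrative choice, and the fit should only be trusted within the range of a values covered by the simulations.

```python
import math


def n_min(a, intercept=-1.887, slope=-1.144):
    """Evaluate the fitted relation ln N_min = intercept + slope * ln a."""
    return math.exp(intercept + slope * math.log(a))
```

Halving a multiplies the predicted N_min by 2^1.144, so networks with weaker internal linking must grow considerably larger before the clustering coefficient starts to increase.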

We thus conclude that the decreasing C observed for
our database, shown in Fig. 4, does not represent the
asymptotic behavior. The observed behavior also indicates
that one should view with caution the values of C reported in
the literature and measured for finite time-frames (maximum
5 years), as they might not represent
asymptotic values.

Degree distribution: The simulations provide P(k) as
well, allowing us to check the validity of the predictions of
the continuum theory. Although the considered system
sizes are rather small (N_max = 3500) compared to the
N → ∞ approximation used in the analytical calculation
and to the N_M = 70,975 and N_NS = 209,750 of the empirical
data, the behavior of P(k), shown in Fig. 14, agrees with
our continuum model and measurements. For small k
we observe the γ = 2 scaling, while for large k P(k)
converges to the predicted γ = 3 exponent.

FIG. 14. Connectivity distributions as predicted by numerical
simulation for different stages of evolution of the network
(a = 0.001, β = 1 and b = 2).



VII. NONLINEAR EFFECTS 

An issue that remained unresolved up to this point
concerns the effect of the nonlinear preferential attachment.
We have seen in Sect. III F that for the incoming links we
have

$$ \Pi(k) \propto k^{\nu}, \tag{25} $$

with ν ≈ 0.8. On the other hand, for such preferential
attachment Krapivsky et al. have shown that the degree
distribution follows a stretched exponential, i.e. the power
law is absent [Hj. This would indicate that P(k) for the 
co-authorship network should follow a stretched expo- 
nential, which disagrees with our and Newman's findings 
(we have explicitly checked that a stretched exponential 
is not a good fit for our data). What could then override 
the known effect of the ν < 1 nonlinear behavior? Next
we propose a possible explanation: the linearity of the
internal preferential attachment can restore the power law
nature of P(k).

For non-integer ν values the differential equation (8)
governing the evolution of the connectivity is not
analytically solvable. However, in the extreme case ν = 0
(no preferential attachment for new nodes) the equation
is again analytically tractable. Equation (8) in this case
has the form

$$ \frac{dk_i}{dt} = \frac{b\beta}{N(t)} + N(t)\,a\,{\sum_j}' \frac{k_i k_j}{{\sum_{s,m}}' k_s k_m}. \tag{26} $$



Using N(t) = βt and ⟨k⟩ = at + 2b, which are valid in
this case as well, and following the steps described in Sect. V,
we obtain the differential equation:

$$ \frac{dk_i}{dt} = \frac{b}{t} + \frac{a k_i}{at + 2b}. \tag{27} $$

The general solution of this equation has the form:

$$ k_i(t) = (2b + at)\,C_i + \frac{1}{2}\,(2b + at)\,\log\frac{t}{2b + at}, \tag{28} $$



where C_i is an integration constant which can be
determined using the k_i(t_i) = b initial condition. We thus
obtain:

$$ k_i(t) = b\,\frac{2b + at}{2b + at_i} + \frac{1}{2}\,(2b + at)\,\log\left[\frac{t\,(2b + at_i)}{t_i\,(2b + at)}\right]. \tag{29} $$



The degree distribution cannot be determined analytically,
since the t_i(k_i) function is not analytical. However,
taking the {t, t_i} → ∞ limit, i.e. focusing on the
network's long time evolution, we obtain

$$ k_i(t) \propto \frac{t}{t_i}, \tag{30} $$

which again predicts a power-law degree distribution:

$$ P(k) \propto \frac{1}{k^2}. \tag{31} $$



Consequently, we obtain that in the asymptotic limit
for ν = 0 the scale-free degree distribution has the same
tail as we obtained for ν = 1. This result suggests that
the linearity in the internal preferential attachment
determines the asymptotic form of the degree distribution.
The real exponent ν = 0.8 is between the two
asymptotically solvable cases ν = 0 and ν = 1, but, based on
the limiting behavior of the two extremes, we expect that
independently of the value of 0 < ν < 1, in the asymptotic
limit the degree distribution should converge to a
power-law with γ = 2. On the other hand, we expect
that the nonlinear ν ≠ 1 behavior would have a considerable
effect on the non-asymptotic behavior, which is not
accessible analytically at this point.
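
The ν = 0 case can also be checked numerically. The sketch below assumes the evolution equation dk/dt = b/t + a k/(at + 2b) with k(t_i) = b, integrates it with a classical fourth-order Runge-Kutta scheme, and compares the result with the closed form obtained from the same initial condition. It then checks that k_i t_i / t becomes independent of t_i at long times, which is the scaling that leads to P(k) ∝ 1/k². The function names and parameter values are illustrative assumptions.

```python
import math


def rhs(t, k, a, b):
    """Right-hand side dk/dt = b/t + a*k / (a*t + 2*b) for the nu = 0 case."""
    return b / t + a * k / (a * t + 2.0 * b)


def integrate(t_i, t_end, a, b, steps=20000):
    """Integrate from k(t_i) = b to t_end with classical 4th-order Runge-Kutta."""
    h = (t_end - t_i) / steps
    t, k = t_i, b
    for _ in range(steps):
        k1 = rhs(t, k, a, b)
        k2 = rhs(t + h / 2.0, k + h * k1 / 2.0, a, b)
        k3 = rhs(t + h / 2.0, k + h * k2 / 2.0, a, b)
        k4 = rhs(t + h, k + h * k3, a, b)
        k += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
    return k


def closed_form(t, t_i, a, b):
    """Solution with k(t_i) = b: linear term plus logarithmic term."""
    lin = b * (2.0 * b + a * t) / (2.0 * b + a * t_i)
    log_arg = t * (2.0 * b + a * t_i) / (t_i * (2.0 * b + a * t))
    return lin + 0.5 * (2.0 * b + a * t) * math.log(log_arg)


def scaled_degree(t, t_i, a, b):
    """k_i * t_i / t, which should lose its t_i dependence at long times."""
    return closed_form(t, t_i, a, b) * t_i / t
```

Comparing `scaled_degree` for two widely separated arrival times at the same late observation time shows how closely the k_i ∝ t/t_i scaling is obeyed.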

To test further the potential effect of the nonlinearities,
we have simulated the model discussed in Sect. VI with
ν = 0.75, all other parameters being unchanged. We
show in Fig. 15 the degree distribution for the linear
(ν = 1) and the nonlinear (ν = 0.75) case.

[Figure legend: linear preferential attachment; nonlinear preferential attachment; slope -3]

FIG. 15. Connectivity distribution generated by the numerical
simulations for linear (ν = 1) and nonlinear (ν = 0.75)
preferential attachment (N_max = 3500, a = 0.0005, β = 1 and
b = 2).

As one can see, the ν = 1 and ν = 0.75 cases can
hardly be distinguished. This could have two origins.
First, the simulations are limited to t = 3500 simulation
steps, due to the discussed running time limitations.
Thus we are hardly in the asymptotic regime. On the
other hand, the agreement indicates that the nonlinear
effect has a hardly distinguishable effect on P(k), the
internal attachment again dominating the system behavior.

In summary, the dominance of internal attachment
is expected to be even more pronounced in the real
network. Indeed, in the collaboration network the fraction
of the links created as internal links is much higher
than the fraction created by the incoming nodes, as an author
qualifies for a new incoming link only on his first paper.
Most scientists contribute for a considerable time to the
same field, publishing numerous subsequent papers, and
these later links will all appear as internal links. Thus
typically the number of internal links is much higher than
the number of new links, and the network's topology is
driven much more by the internal links than by the external
ones. This is one possible reason why the effect of the
nonlinear behavior, while clearly present, cannot be
detected in the functional form of P(k).

VIII. DISCUSSION 

In the last two years we witnessed considerable
advances in addressing the topology and the dynamics of
complex networks. Along this road a number of quantities
have been measured and calculated, aiming to characterize
the network topology. However, most of these
studies are fragmented, focusing on one or a few
characteristics of the network at a time. Here we presented a
detailed study of a network of high interest to the scientific
community, the collaboration network of scientists,
which also represents a prototype example of a complex
evolving network. This study allows us to investigate to
what degree we can use various known measures to
characterize a given network. A first and important result of
our investigation is that we need to be careful to
distinguish between the asymptotic and the intermediate
behavior. In particular, most quantities used to
characterize the network are time dependent. For example, the
diameter, the clustering coefficient, as well as the average
degree of the nodes are often used as basic time
independent network characteristics. Our empirical results
show that many of these key quantities are time dependent,
without a tendency to saturate within the available
time-frame. Thus their value at a given moment tells us
little about the network. They can be used, however, at
any moment, to show that the network has small world
properties, i.e. it has a small average separation, and a
clustering coefficient that is larger than the one expected for
a random network.

A quantity that is often believed to offer a stationary
measure of the network is the degree distribution. Our
empirical data, together with the analytic solution of the
model, show that this is true only asymptotically for the
co-authorship network: we uncover a crossover behavior
between two different scaling regimes. We tend to believe
that the model's predictions are not limited to the
collaboration network: as similar basic processes take place
on the WWW and in the actor collaboration network,
chances are that similar crossovers could appear there as
well.

A third important conclusion of the study is that
measurements done on incomplete databases can show
trends that are opposite to those seen in the full system.
An example is the
node separation: we find that the empirically observed
decreasing tendency is an artifact of the incomplete data.
However, our simulations show that one can, with careful
modeling, uncover such inconsistencies. But this also
offers an important warning: for any network, before
attempting to model it, we need to fully understand the
limitations of the data collection process, and test their
effect on the quantities of interest.

The model presented here represents only the starting
point toward a complete modeling of collaborations in
science. As we discussed throughout the paper, we have
made several important approximations, sacrificing
certain known network features for an analytical solution.
For example, we neglected in our modeling effort the po- 
tential effect of aging |||l2| , reflecting the fact that sci- 
entists retire and stop publishing papers. In the long 
run such aging effects will, undoubtedly, introduce expo- 
nential cutoffs in P(k), as there are inherent limits on 
how many papers a researcher can write. Those effects, 
however, are not visible in our datasets. There are several
potential reasons for this. Probably the most important
is that even the eight years available to us for study
is much shorter than the professional life of a scientist.
Such aging-induced cutoffs are expected to be visible only
when time-frames several times the length of a scientist's
professional life are studied. Data availability so far does
not permit such studies.

A second simplification is that we assumed that each
paper has exactly m authors. That is far from being so,
as the number of co-authors varies greatly between papers.
However, it is hard to imagine that the inclusion of
a stochastic component in m would fundamentally affect
our results. It is clear that such a stochastic component
will not affect P(k), and we feel that the effect on d or
C is also negligible, but we lack at this point results to
support this latter claim.

A surprising result of our study is the power law
character of P(k), despite the fact that Π(k) is nonlinear. We
have shown that the existence of a linear internal attachment
rule is able to restore the power law P(k). Considering
the fact that the largest fraction of links appear
as internal links, compared with links created by new
authors, it is fair to expect that the scaling determined
by this internal linking process will dominate. The fact
that for the two limits of the internal linking exponents,
ν = 0 and ν = 1, we obtained power law P(k) despite
the nonlinear external Π(k), suggests that such power
law might appear for nonlinear, ν ≠ 0, 1 internal Π(k) as
well. Solving this problem is a formidable challenge, but
it is perhaps worth the effort.

Finally, a more detailed modeling of the co-authorship
network would involve the construction of bipartite
graphs [33], in which we directly simulate the publishing
of papers by several co-authors, which are all connected
to each other. In such a model the basic unit is
a paper, that involves several "old" and "new" authors. 
In such a framework one can simultaneously study the 
evolution of the co-authorship network (in which nodes 
are scientists linked by joint publications) and the pub- 
lication network (in which nodes are papers linked by 
joint authors). One can imagine that coupled continuum 
equations could be formulated for such bipartite network 
as well, which would eventually predict the network's dy- 
namics and topology. Undoubtedly including such detail 
in the modeling effort would increase the fidelity of the 
model. While challenging, following such path is beyond 
our goals here. 

In summary, the modeling efforts presented here are 
only the starting point for a systematic investigation of 
the evolution of social networks. It is important to note 
that such modeling is open ended: more details can be 
incorporated, that could undoubtedly improve the agree- 
ment between the empirical data and theory. And such 
improvements might not be in vain: they could point 
towards a better understanding of the evolution of not 
only the co-authorship graph, but complex networks in 
general. 

Figure Captions:

Fig. 1 (a) Cumulative number of papers for the M and
NS databases in the period 1991-98. The inset shows the
number of papers published each year. (b) Cumulative
number of authors (nodes) for the M and NS databases
in the period 1991-98. The inset shows the number of
new authors added each year.

Fig. 2 Degree distribution for the (a) M and (b) NS
database, showing the data based on the cumulative results
up to years 1993 (x) and 1998 (•). (c) Degree distribution
shown with logarithmic binning computed from
the full dataset cumulative up to 1998. The lines correspond
to the best fits, and have the slope 2.1 (NS, dotted)
and 2.4 (M, dashed).

Fig. 3 Average separation in the M and NS databases. 
The separation is computed on the cumulative data up 
to the indicated year. 

Fig. 4 Clustering coefficient of the M and NS database, 
determined for the cumulative data up to the year indi- 
cated on the t axis. 

Fig. 5 Relative size of the largest cluster for the M and 
NS database. Results are computed on the cumulative 
data up to the given year. 

Fig. 6 Average number of links per node (< k >) for 
the M and NS database. Results are computed on the 
cumulative data up to the given year. 

Fig. 7 Cumulated preferential attachment (κ(k)) of
incoming new nodes for the M and NS database. Results
computed by considering the new nodes coming in the
specified year, and the network formed by nodes already
present up to this year. In the absence of preferential
attachment κ(k) ~ k, shown as continuous line on the
figures.

Fig. 8 Internal preferential attachment for the M and
NS database, 3D plots: Δk as a function of k_1 and k_2.
Results computed on the cumulative data in the last
considered year.

Fig. 9 Cumulated internal preferential attachment
(κ(k_1 k_2)) for the M and NS database, scaling as a function
of the k_1 k_2 product. Results computed on the cumulative
data in the last considered year. The straight lines
have slope 1, expected for κ(k_1 k_2) if there were no
preferential attachment.

Fig. 10 Scaling of Φ(k) (a) and of Γ(k) (b) for the
NS database, demonstrating the trends in the small and
large k behavior of the degree distribution (see text).

Fig. 11 Computer simulated dynamics of the average
connectivity in the proposed model. (N_max = 1000, a =
0.001, β = 1 and b = 2)

Fig. 12 Computer simulated dynamics for the real and
apparently measured diameter value. (N_max = 1000,
a = 0.001, β = 1, b = 2 and N_s = 200)

Fig. 13 Clustering coefficient for different values of
the a parameter as a function of the system size N.
(N_max = 1000, β = 1 and b = 2; values of a are 0 (•),
0.00025 (+), 0.0005 (△), 0.00075 (*), 0.002 (▽).) The
inset shows the scaling of the N_min value as a function
of the a parameter. (N_max = 1000, β = 1 and b = 2; the
line shows a fit ln N_min = −1.887 − 1.144 · ln a.)

Fig. 14 Connectivity distributions as predicted by
numerical simulation for different stages of evolution of the
network (a = 0.001, β = 1 and b = 2).

Fig. 15 Connectivity distribution generated by the
numerical simulations for linear (ν = 1) and nonlinear
(ν = 0.75) preferential attachment (N_max = 3500,
a = 0.0005, β = 1 and b = 2).